# Analysis Pipeline: Tables and Figures

This directory contains the Stata (`.do`) files required to produce the main empirical results, summary statistics, and robustness checks for the paper.

## 1. Directory Structure

### Core Analysis & Tables
* **`00.sumstats_table1.do`**
    * **Purpose:** Generates Panel A of Table 1.
    * **Details:** Calculates means and standard deviations for demographics and earnings, disaggregated by "Top School" status.
* **`01.reg_table2.do`**
    * **Purpose:** Produces the main results for Table 2.
    * **Details:** Runs OLS and quantile regressions (25th, 50th, and 75th percentiles) comparing log earnings across institutional tiers.
* **`02.regs_figure4.do`**
    * **Purpose:** Estimates the state-by-cohort-by-CIP cells used to generate Figure 4, focusing on the relationship between missingness and bias.
* **`02.regs_table4.do`**
    * **Purpose:** Generates the main estimates for **Table 4**.
    * **Details:** Runs OLS and quantile regressions for earnings outcomes 1, 5, and 10 years after graduation. It estimates the coefficients on "flagship" or "top" schools separately for national and in-state earnings; the gap between the two measures the bias.
* **`02.regs_table_a4.do`**
    * **Purpose:** Generates **Table A4: Bias By Residency**.
    * **Details:** Investigates whether the bias in earnings estimates differs for students who were classified as residents versus non-residents during their studies.
* **`06.01.leave_one_out.do`**
    * **Purpose:** Robustness check.
    * **Details:** Performs a leave-one-out analysis, re-estimating the regression with one state excluded at a time, to confirm that the results are not driven by a single outlier state.
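
The two estimation patterns described above (OLS plus quantile regressions, and the leave-one-state-out loop) can be sketched roughly as follows. This is a minimal illustration, not the code in the `.do` files: the variable names `ln_earn`, `top_school`, and `state` are assumptions, and the actual specifications likely include additional controls.

```stata
* Sketch only. Hypothetical variables:
*   ln_earn     log earnings
*   top_school  0/1 indicator for "Top School" status
*   state       string state abbreviation (e.g., "CO")
* Assumes the analysis data are already loaded.

* OLS, then quantile regressions at the 25th, 50th, and 75th percentiles
regress ln_earn top_school, vce(robust)
foreach q in .25 .5 .75 {
    qreg ln_earn top_school, quantile(`q')
}

* Leave-one-out: drop one state at a time and re-estimate
levelsof state, local(states)
foreach s of local states {
    quietly regress ln_earn top_school if state != "`s'"
    display "Excluding `s': b = " %6.4f _b[top_school]
}
```

If the coefficient stays close to the full-sample estimate for every excluded state, no single state is driving the result.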


### Figures & Visualizations

* **`03.cipgraph_figure4.do` (and `06.02.cip_graphs.do`)**
    * **Purpose:** Generates **Figure 4**, showing bias by academic major (CIP codes).
    * **Details:** Maps 2-digit CIP codes to readable labels (e.g., Engineering, Biology, Business) and calculates the earnings bias for each field of study.
* **`02.regs_figure5.do`**
    * **Purpose:** Produces the state-by-cohort-by-year estimates underlying Figure 5.
    * **Details:** Iterates over specific states (CO, NY, OH, PA, TX, etc.) and graduation cohorts to calculate the "bias" (the difference between the national and in-state earnings coefficients) and the "missingness" rate.
* **`03.biasgraph_figure5.do` (and `06.03.bias_graphs.do`)**
    * **Purpose:** Generates **Figure 5** from the estimates produced by `02.regs_figure5.do`.
    * **Details:** Creates scatter plots and regressions of the calculated bias against the difference in rates of "no in-state earnings" (missing data) across different states.
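
As a rough illustration of the quantities behind Figure 5, the sketch below computes a "bias" and a "missingness" rate for a single cell. The variable names are hypothetical, and the sign convention (national minus in-state) is an assumption. The actual scripts loop over states and cohorts.

```stata
* Sketch only. Hypothetical variables:
*   ln_earn_national  log earnings from national records
*   ln_earn_instate   log earnings from in-state records (missing if none)
*   top_school        0/1 indicator for "Top School" status
* Assumes the data for one state-by-cohort cell are already loaded.

quietly regress ln_earn_national top_school
scalar b_national = _b[top_school]

quietly regress ln_earn_instate top_school
scalar b_instate = _b[top_school]

* Bias: gap between the national and in-state coefficients (assumed sign)
scalar bias = b_national - b_instate

* Missingness: share of observations with no in-state earnings
generate byte no_instate = missing(ln_earn_instate)
quietly summarize no_instate
scalar missrate = r(mean)

display "bias = " %6.4f scalar(bias) "   missingness = " %5.3f scalar(missrate)
```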


### Robustness & Appendix Scripts
* **`02.06.leebounds.do`**: Implements the Lee (2009) bounding procedure to account for non-random selection into in-state employment (Table A8).
* **`02.regs_byres_table_a4.do`**: Estimates the college premium by residency status (Table A4) to test for baseline differences between in-state and out-of-state students.
* **`00.terminalruns.do`**: Analyzes "terminal runs" (periods of zero earnings) to assess the extent of permanent exit from the labor data.
* **`02.02.01.regs_sandwich.do`**: Estimates the "sandwich" models discussed in the methodology section (robustness check for earnings measurement).
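
For orientation, a basic `leebounds` call might look like the sketch below. The variable names are hypothetical and the specification in `02.06.leebounds.do` may differ. See `help leebounds` for the full option list.

```stata
* Sketch only. Hypothetical variables:
*   ln_earn_instate  outcome, missing when no in-state earnings are observed
*   top_school       0/1 treatment indicator
* Requires the user-written package: ssc install leebounds

* Explicit selection indicator: 1 if in-state earnings are observed
generate byte has_instate = !missing(ln_earn_instate)

* Lee (2009) bounds on the effect of top_school under sample selection
leebounds ln_earn_instate top_school, select(has_instate)
```

When `select()` is omitted, `leebounds` treats observations with a missing outcome as not selected, so the explicit indicator here mainly documents intent.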

### Shared Dependencies (Helper Files)
* **`restrictions.do`**: A standardized script called by all analysis files to ensure consistent sample selection (e.g., ages 18–65, graduation cohorts 2001–2013).
* **`top_school.do`**: Manually defines "Top Schools" using specific OPEID codes for public flagship and elite institutions.
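
The kind of logic these helper files contain can be sketched as below. The variable names (`age`, `grad_cohort`, `opeid`) are assumptions, `opeid` is assumed to be a string, and the OPEID values are placeholders rather than real institution codes.

```stata
* restrictions.do (sketch): hypothetical variable names
keep if inrange(age, 18, 65)
keep if inrange(grad_cohort, 2001, 2013)

* top_school.do (sketch): placeholder codes, not real OPEIDs
generate byte top_school = inlist(opeid, "00000000", "99999999")
label variable top_school "Top School (flagship or elite institution)"
```

Keeping these definitions in shared files means a change to the sample or to the "Top School" list propagates to every table and figure.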

---

## 2. Requirements & Setup

* **Software**: Stata 16 or later.
* **Required Package**: `leebounds` (Install via `ssc install leebounds`).
* **Environment**: Ensure `config.do` is present in your working directory to define the following global macros:
    * `$datadir`: Path to `all_earnings_long.dta`.
    * `$supportdir`: Destination for exported `.dta` and `.csv` results.
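
A minimal `config.do` could look like the following. Both paths are placeholders, and the sketch assumes `$datadir` points to the folder that contains `all_earnings_long.dta`.

```stata
* config.do (sketch): replace the placeholder paths with your own
global datadir    "/path/to/data"
global supportdir "/path/to/results"
```

Under that assumption, an analysis file would run `do config.do` and then load the data with `use "$datadir/all_earnings_long.dta", clear`.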

